Projektarbeit des Data-Analytics-Kurs / Dozentin : Beatrice Schwab / Teilnehmerin: Ngoc Phuong Thao NGUYEN / Datum : 19.05.2023
Du arbeitest als Data Scientist bei der P-2-P-Plattform https://www.kiva.org/, die vor einem Jahr gegründet wurde. Nun wollt ihr euer Geschäft erweitern. Euer Team hat sich aufgeteilt und jeder Analyst hat einen Teilbereich der Daten. Deine Aufgabe ist es in einer explorativen Datenanalyse Insights für eure Plattform herauszufinden.
Euer Geschäftsmodell ist das Betreiben einer Plattform (crowd-investing) bei der sich Personen die eine Geschäftsidee haben, aber nicht das benötigte Geld, anmelden und für ihr Projekt innerhalb einer vorgegebenen Zeit Geld sammeln können.
Auf der anderen Seite habt ihr Geldgeber, die gern ihr Geld in Projekte anlegen möchten und nach Investitionen suchen.
Als Vermittler bringt eure Plattform also Geldnehmer und Geldgeber zusammen.
Deine Datenbasis ist die Historie eurer Plattform.
Zusätzliche Annahmen über das Geschäftsmodell
Alle Projekte sind abgeschlossene Projekte, d.h. die Zeit, um für sein Projekt Geld zu sammeln ist abgelaufen. Euer Geschäftsmodell sieht es vor, dass die gesammelten Gelder ausgezahlt werden, auch wenn der Zielbetrag nicht erreicht wurde.
Ihr verdient euer Geld mit einer Provision für jedes Projekt was auf eurer Plattform landet.
Der Geldgeber erhält einen Zins für die Geldleihe.
Bevor mit der Analyse losgelegt werden kann, ist das Ziel erst einmal zu bestimmen.
Welche Informationen können für das Unternehmen aus den Daten generiert werden?
Wofür stehen die Spalten?
# benötigte Bibliotheken importieren
# für Daten
import numpy as np
import pandas as pd
# für Grafik
import plotly.express as px
Identifizierung des Trennzeichens
# Datensatz einlesen - erstmal mit /n als sep, um das Trennzeichen zu erkennen
df_input = pd.read_csv("data_abschlussprojekt.csv", sep="/n", engine='python', nrows=2)
df_input
# Trennzeichen ist # und erste Spalte ist index
| # funded_amount# loan_amount# activity# sector# use# country_code# country# region# currency# term_in_months# lender_count# borrower_genders# repayment_interval | |
|---|---|
| 0 | 0#300.0#300.0#Fruits & Vegetables#Food#To buy ... |
| 1 | 1#575.0#575.0#Rickshaw#Transportation#to repai... |
Daten mit korrektem Trennzeichen einlesen
# Daten einlesen mit dem korrekten Trennzeichen
df_kiva = pd.read_csv("data_abschlussprojekt.csv", sep="#", index_col=0)
df_kiva.head(2)
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 300.0 | 300.0 | Fruits & Vegetables | Food | To buy seasonal, fresh fruits to sell. | PK | Pakistan | Lahore | PKR | 12.0 | 12 | female | irregular |
| 1 | 575.0 | 575.0 | Rickshaw | Transportation | to repair and maintain the auto rickshaw used ... | PK | Pakistan | Lahore | PKR | 11.0 | 14 | female, female | irregular |
# Spalten anzeigen
df_kiva.columns
Index([' funded_amount', ' loan_amount', ' activity', ' sector', ' use',
' country_code', ' country', ' region', ' currency', ' term_in_months',
' lender_count', ' borrower_genders', ' repayment_interval'],
dtype='object')
# Leerzeichen entfernen in den Spalten
df_kiva.columns = df_kiva.columns.str.strip()
# check, ob Leerzeichen entfernt sind
df_kiva.columns
Index(['funded_amount', 'loan_amount', 'activity', 'sector', 'use',
'country_code', 'country', 'region', 'currency', 'term_in_months',
'lender_count', 'borrower_genders', 'repayment_interval'],
dtype='object')
# Datentypen
df_kiva.dtypes
funded_amount float64 loan_amount float64 activity object sector object use object country_code object country object region object currency object term_in_months float64 lender_count int64 borrower_genders object repayment_interval object dtype: object
Auswertung Datentypen
--> term_in_months: muss inhaltlich ein int sein --> technischer Datentyp ist ein float --> soll int verwenden --> activity, sector, use, country_code, country, region, currency, borrower_genders, repayment_interval: sind Objeckt Datentyp --> bei fehlenden Werten näher anschauen
# kleine statistische Auswertung
df_kiva.describe()
| funded_amount | loan_amount | term_in_months | lender_count | |
|---|---|---|---|---|
| count | 671205.000000 | 671205.000000 | 671205.000000 | 671205.000000 |
| mean | 785.995061 | 842.397107 | 13.739022 | 20.590922 |
| std | 1130.398941 | 1198.660073 | 8.598919 | 28.459551 |
| min | 0.000000 | 25.000000 | 1.000000 | 0.000000 |
| 25% | 250.000000 | 275.000000 | 8.000000 | 7.000000 |
| 50% | 450.000000 | 500.000000 | 13.000000 | 13.000000 |
| 75% | 900.000000 | 1000.000000 | 14.000000 | 24.000000 |
| max | 100000.000000 | 100000.000000 | 158.000000 | 2986.000000 |
Auswertung
min()-Werte sehen ok aus.
In der Spalte term_in_month/ lender_count liegt ein Wert bei 158/ 2986. Dieser Wert scheint außerhalb des Wertebreiches zu liegen und wird später hinsichtlich seiner Plausibilität überprüft.
Es wurden 69,349 Duplikate gefunden, bei denen alle Spalteneinträge übereinstimmen. Es ist jedoch wichtig zu beachten, dass diese Duplikate nicht unbedingt dasselbe Projekt repräsentieren. Es könnte sein, dass erfolgreiche Projekte in einer Region Nachahmer gefunden haben und diese Nachahmer möglicherweise Unterstützung bei der Erstellung ihrer Projekte erhalten haben. In solchen Fällen ist es wahrscheinlich, dass die Kurzbeschreibungen der Projekte ebenfalls identisch sind.
Laut unserer Grundannahme handelt es sich um ein Projekt, das mehrfach in der Datenbank auftritt, wenn alle Zahlungsinformationen, das Zielland und die Projektinformationen übereinstimmen. Ein eindeutiger Indikator dafür ist, wenn der individuelle Verwendungszweck ('use') identisch ist. Allerdings wird der individuelle Verwendungszweck nicht immer angegeben, daher muss zusätzlich überprüft werden, ob 'use' gleich NaN ist.
df_kiva.loc[df_kiva.duplicated(),:]
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 498 | 100.0 | 100.0 | Home Energy | Personal Use | to buy a solar-powered lamp. | SV | El Salvador | NaN | USD | 14.0 | 4 | male | monthly |
| 606 | 100.0 | 100.0 | Home Energy | Personal Use | to buy a solar-powered lamp. | SV | El Salvador | NaN | USD | 14.0 | 4 | male | monthly |
| 808 | 450.0 | 450.0 | Higher education costs | Education | to pay for one semester's registration fees. | CO | Colombia | Bogotà | COP | 7.0 | 15 | female | monthly |
| 1703 | 500.0 | 500.0 | Higher education costs | Education | To buy a laptop for educational purposes. | SO | Somalia | Hargeisa | USD | 8.0 | 19 | male | monthly |
| 2317 | 250.0 | 250.0 | Poultry | Agriculture | to purchase poultry. | KE | Kenya | Ndaragwa | KES | 16.0 | 10 | female | monthly |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 671200 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'para compara: cemento, arenya y ladri... | PY | Paraguay | Concepción | USD | 13.0 | 0 | female | monthly |
| 671201 | 25.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | NaN | KES | 13.0 | 1 | female | monthly |
| 671202 | 0.0 | 25.0 | Games | Entertainment | NaN | KE | Kenya | NaN | KES | 13.0 | 0 | NaN | monthly |
| 671203 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | NaN | KES | 13.0 | 0 | female | monthly |
| 671204 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | NaN | KES | 13.0 | 0 | female | monthly |
24372 rows × 13 columns
df_kiva_copy = df_kiva.copy()
columns_to_check = ['funded_amount','loan_amount','activity','sector','country_code','term_in_months','lender_count']
duplicate_rows = df_kiva_copy[df_kiva_copy.duplicated(subset=columns_to_check,keep=False)]
use_nan_mask = duplicate_rows['use'].isna()
duplicate_rows = duplicate_rows[use_nan_mask]
columns_to_check.append('use')
final_duplicates = df_kiva_copy[df_kiva_copy.duplicated(subset=columns_to_check,keep=False)]
final_duplicates
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 51 | 175.0 | 175.0 | Pigs | Agriculture | to buy piglets and feed | PH | Philippines | Liloy-Dela Paz | PHP | 8.0 | 6 | female | irregular |
| 52 | 175.0 | 175.0 | Pigs | Agriculture | to purchase feed and vitamins for her pigs | PH | Philippines | Tanjay, Negros Oriental | PHP | 8.0 | 7 | female | irregular |
| 62 | 4275.0 | 4275.0 | Personal Housing Expenses | Housing | to pay for the construction of a hygienic toil... | VN | Vietnam | Hai Duong | VND | 14.0 | 144 | female, female, female, female, female, female... | bullet |
| 191 | 225.0 | 225.0 | Fish Selling | Food | to buy more fish to sell | PH | Philippines | Minglanilla, Cebu | PHP | 11.0 | 9 | female | irregular |
| 237 | 4275.0 | 4275.0 | Personal Housing Expenses | Housing | to pay for the construction of a hygienic toil... | VN | Vietnam | Hai Duong | VND | 14.0 | 144 | female, female, female, female, female, male, ... | bullet |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 671200 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'para compara: cemento, arenya y ladri... | PY | Paraguay | Concepción | USD | 13.0 | 0 | female | monthly |
| 671201 | 25.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | NaN | KES | 13.0 | 1 | female | monthly |
| 671202 | 0.0 | 25.0 | Games | Entertainment | NaN | KE | Kenya | NaN | KES | 13.0 | 0 | NaN | monthly |
| 671203 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | NaN | KES | 13.0 | 0 | female | monthly |
| 671204 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | NaN | KES | 13.0 | 0 | female | monthly |
69349 rows × 13 columns
Aus diesem obengenannten Grund werden die gefundenen Duplikate nicht gelöscht, sondern im Datensatz für die folgenden Analysen beibehalten.
Aktuell haben wir etwa 10% der Daten als Duplikate identifiziert.
# Identifizierung über Datentyp - abgleich datentyp mit Spalteninhalt
df_kiva.dtypes
funded_amount float64 loan_amount float64 activity object sector object use object country_code object country object region object currency object term_in_months float64 lender_count int64 borrower_genders object repayment_interval object dtype: object
Spalte inhaltlicher Datentyp technischer Datentyp Erkenntnis
funded_amount float float64 ok
loan_amount float float64 ok
activity object object ok
sector object object ok
use object object ok
country_code object object ok
country object object ok
region object object ok
currency object object ok
term_in_months float float64 soll datentyp in int verwenden?
lender_count int int64 ok
borrower_genders object object näher anschauen
repayment_interval object object ok
# prüfen, welcher fehlende Werte in df_kiva sind:
df_kiva.isnull().sum()
funded_amount 0 loan_amount 0 activity 0 sector 0 use 4232 country_code 8 country 0 region 56800 currency 0 term_in_months 0 lender_count 0 borrower_genders 4221 repayment_interval 0 dtype: int64
df_kiva.isnull().mean() * 100
funded_amount 0.000000 loan_amount 0.000000 activity 0.000000 sector 0.000000 use 0.630508 country_code 0.001192 country 0.000000 region 8.462392 currency 0.000000 term_in_months 0.000000 lender_count 0.000000 borrower_genders 0.628869 repayment_interval 0.000000 dtype: float64
# Prüfen in df_kiva welche fehlende Werte gibt?
df_kiva[df_kiva.isna().any(axis=1)]
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 250.0 | 250.0 | Services | Services | purchase leather for my business using ksh 20000. | KE | Kenya | NaN | KES | 4.0 | 6 | female | irregular |
| 49 | 450.0 | 450.0 | General Store | Retail | to stock his store. | SV | El Salvador | NaN | USD | 14.0 | 18 | male | monthly |
| 54 | 225.0 | 225.0 | Food Market | Food | to purchase various seasonal items to resell: ... | SN | Senegal | NaN | XOF | 14.0 | 7 | female | monthly |
| 67 | 125.0 | 125.0 | Energy | Services | purchase solar lanterns for resale. | KE | Kenya | NaN | KES | 3.0 | 6 | male | irregular |
| 70 | 2000.0 | 2000.0 | Retail | Retail | to install a display window and a sunshade for... | IQ | Iraq | NaN | USD | 15.0 | 71 | male | monthly |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 671197 | 0.0 | 25.0 | Livestock | Agriculture | Pretend the issue with loan got addressed by K... | KE | Kenya | NaN | KES | 13.0 | 0 | female | monthly |
| 671201 | 25.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | NaN | KES | 13.0 | 1 | female | monthly |
| 671202 | 0.0 | 25.0 | Games | Entertainment | NaN | KE | Kenya | NaN | KES | 13.0 | 0 | NaN | monthly |
| 671203 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | NaN | KES | 13.0 | 0 | female | monthly |
| 671204 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | NaN | KES | 13.0 | 0 | female | monthly |
56813 rows × 13 columns
df_kiva.loc[df_kiva.loc[:, 'use'].isna(), :]
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 140 | 2975.0 | 2975.0 | Food Production/Sales | Food | NaN | TZ | Tanzania | NaN | TZS | 10.0 | 110 | NaN | monthly |
| 145 | 1200.0 | 1200.0 | Personal Expenses | Personal Use | NaN | PE | Peru | NaN | PEN | 20.0 | 44 | NaN | monthly |
| 170 | 4250.0 | 4250.0 | Catering | Food | NaN | TZ | Tanzania | NaN | TZS | 10.0 | 116 | NaN | monthly |
| 412 | 2350.0 | 2350.0 | Beauty Salon | Services | NaN | TZ | Tanzania | NaN | TZS | 10.0 | 75 | NaN | monthly |
| 414 | 725.0 | 725.0 | Agriculture | Agriculture | NaN | SV | El Salvador | NaN | USD | 20.0 | 19 | NaN | monthly |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 671151 | 0.0 | 25.0 | Livestock | Agriculture | NaN | KE | Kenya | NaN | KES | 13.0 | 0 | NaN | monthly |
| 671174 | 0.0 | 25.0 | Games | Entertainment | NaN | KE | Kenya | NaN | KES | 13.0 | 0 | NaN | monthly |
| 671178 | 0.0 | 25.0 | Livestock | Agriculture | NaN | KE | Kenya | NaN | KES | 13.0 | 0 | NaN | monthly |
| 671185 | 0.0 | 25.0 | Livestock | Agriculture | NaN | KE | Kenya | NaN | KES | 13.0 | 0 | NaN | monthly |
| 671202 | 0.0 | 25.0 | Games | Entertainment | NaN | KE | Kenya | NaN | KES | 13.0 | 0 | NaN | monthly |
4232 rows × 13 columns
df_kiva['use'].unique()
array(['To buy seasonal, fresh fruits to sell. ',
'to repair and maintain the auto rickshaw used in their business.',
'To repair their old cycle-van and buy another one to rent out as a source of income',
...,
'Pretend the issue with loan got addressed by Kiva Coordinator.',
'Kiva Coordinator replaced loan use. Should see this in viewdiff.',
'Edited loan use in english.'], dtype=object)
# prüfen, welcher fehlende wert in country_code ist
# 8
print(df_kiva.loc[df_kiva.loc[:, 'country_code'].isna(), :].shape)
df_kiva.loc[df_kiva.loc[:, 'country_code'].isna(), :]
(8, 13)
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 202537 | 4150.0 | 4150.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | EEnhana | NAD | 6.0 | 162 | female | bullet |
| 202823 | 4150.0 | 4150.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | Rundu | NAD | 6.0 | 159 | male | bullet |
| 344929 | 3325.0 | 3325.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | EEnhana | NAD | 7.0 | 120 | female | bullet |
| 351177 | 3325.0 | 3325.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | Rundu | NAD | 7.0 | 126 | male | bullet |
| 420953 | 3325.0 | 3325.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | EEnhana | NAD | 7.0 | 118 | female | bullet |
| 421218 | 4000.0 | 4000.0 | Wholesale | Wholesale | purchase solar lighting products for sale to l... | NaN | Namibia | Rundu | NAD | 7.0 | 150 | male | bullet |
| 487207 | 5100.0 | 5100.0 | Renewable Energy Products | Retail | to pay for stock of solar lights and cell phon... | NaN | Namibia | Katima Mulilo | NAD | 7.0 | 183 | male | bullet |
| 487653 | 5000.0 | 5000.0 | Wholesale | Wholesale | to maintain a stock of solar lights and cell p... | NaN | Namibia | Oshakati | NAD | 7.0 | 183 | female | bullet |
df_kiva['country_code'].unique()
array(['PK', 'IN', 'KE', 'NI', 'SV', 'TZ', 'PH', 'PE', 'SN', 'KH', 'LR',
'VN', 'IQ', 'HN', 'PS', 'MN', 'US', 'ML', 'CO', 'TJ', 'GT', 'EC',
'BO', 'YE', 'GH', 'SL', 'HT', 'CL', 'JO', 'UG', 'BI', 'BF', 'TL',
'ID', 'GE', 'UA', 'XK', 'AL', 'CD', 'CR', 'SO', 'ZW', 'CM', 'TR',
'AZ', 'DO', 'BR', 'MX', 'KG', 'AM', 'PY', 'LB', 'WS', 'IL', 'RW',
'ZM', 'NP', 'CG', 'MZ', 'ZA', 'TG', 'BJ', 'BZ', 'SR', 'TH', 'NG',
'MR', 'VU', 'PA', 'VI', 'VC', 'LA', 'MW', 'MM', 'MD', 'SS', 'SB',
'CN', 'EG', 'GU', 'AF', 'MG', nan, 'PR', 'LS', 'CI', 'BT'],
dtype=object)
# Überprüfen of country "Namibia" schon ein country_code definiert?
# Falls nicht, ISO-2 von Namibia und replace to "NA"
df_kiva.loc[df_kiva["country"]=="Namibia", :]
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 202537 | 4150.0 | 4150.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | EEnhana | NAD | 6.0 | 162 | female | bullet |
| 202823 | 4150.0 | 4150.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | Rundu | NAD | 6.0 | 159 | male | bullet |
| 344929 | 3325.0 | 3325.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | EEnhana | NAD | 7.0 | 120 | female | bullet |
| 351177 | 3325.0 | 3325.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | Rundu | NAD | 7.0 | 126 | male | bullet |
| 420953 | 3325.0 | 3325.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | EEnhana | NAD | 7.0 | 118 | female | bullet |
| 421218 | 4000.0 | 4000.0 | Wholesale | Wholesale | purchase solar lighting products for sale to l... | NaN | Namibia | Rundu | NAD | 7.0 | 150 | male | bullet |
| 487207 | 5100.0 | 5100.0 | Renewable Energy Products | Retail | to pay for stock of solar lights and cell phon... | NaN | Namibia | Katima Mulilo | NAD | 7.0 | 183 | male | bullet |
| 487653 | 5000.0 | 5000.0 | Wholesale | Wholesale | to maintain a stock of solar lights and cell p... | NaN | Namibia | Oshakati | NAD | 7.0 | 183 | female | bullet |
df_kiva['region'].unique().shape # 12696
df_kiva['region'].unique()
array(['Lahore', 'Maynaguri', 'Abdul Hakeem', ..., 'Gbenikoro Village',
'Morimaraia', 'alejandria'], dtype=object)
# # prüfen, welcher fehlende wert in region ist
df_kiva.loc[df_kiva.loc[:, 'region'].isna(), :] # 56800
df_region = df_kiva.loc[df_kiva.loc[:, 'region'].isna(), ['sector','activity','country','region']]
df_region
| sector | activity | country | region | |
|---|---|---|---|---|
| 5 | Services | Services | Kenya | NaN |
| 49 | Retail | General Store | El Salvador | NaN |
| 54 | Food | Food Market | Senegal | NaN |
| 67 | Services | Energy | Kenya | NaN |
| 70 | Retail | Retail | Iraq | NaN |
| ... | ... | ... | ... | ... |
| 671197 | Agriculture | Livestock | Kenya | NaN |
| 671201 | Agriculture | Livestock | Kenya | NaN |
| 671202 | Entertainment | Games | Kenya | NaN |
| 671203 | Agriculture | Livestock | Kenya | NaN |
| 671204 | Agriculture | Livestock | Kenya | NaN |
56800 rows × 4 columns
# prüfen, welcher fehlende werte in borrower_genders ist
# term_in_months sind nur ganzzahlig --> muss int werden
print(df_kiva.loc[:,'borrower_genders'].unique().shape) # 11299 differents values
df_kiva.loc[df_kiva.loc[:, 'borrower_genders'].isna(), :]
(11299,)
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 140 | 2975.0 | 2975.0 | Food Production/Sales | Food | NaN | TZ | Tanzania | NaN | TZS | 10.0 | 110 | NaN | monthly |
| 145 | 1200.0 | 1200.0 | Personal Expenses | Personal Use | NaN | PE | Peru | NaN | PEN | 20.0 | 44 | NaN | monthly |
| 170 | 4250.0 | 4250.0 | Catering | Food | NaN | TZ | Tanzania | NaN | TZS | 10.0 | 116 | NaN | monthly |
| 412 | 2350.0 | 2350.0 | Beauty Salon | Services | NaN | TZ | Tanzania | NaN | TZS | 10.0 | 75 | NaN | monthly |
| 414 | 725.0 | 725.0 | Agriculture | Agriculture | NaN | SV | El Salvador | NaN | USD | 20.0 | 19 | NaN | monthly |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 671151 | 0.0 | 25.0 | Livestock | Agriculture | NaN | KE | Kenya | NaN | KES | 13.0 | 0 | NaN | monthly |
| 671174 | 0.0 | 25.0 | Games | Entertainment | NaN | KE | Kenya | NaN | KES | 13.0 | 0 | NaN | monthly |
| 671178 | 0.0 | 25.0 | Livestock | Agriculture | NaN | KE | Kenya | NaN | KES | 13.0 | 0 | NaN | monthly |
| 671185 | 0.0 | 25.0 | Livestock | Agriculture | NaN | KE | Kenya | NaN | KES | 13.0 | 0 | NaN | monthly |
| 671202 | 0.0 | 25.0 | Games | Entertainment | NaN | KE | Kenya | NaN | KES | 13.0 | 0 | NaN | monthly |
4221 rows × 13 columns
# prüfen, welcher fehlende wert in use ist
# total 4232
df_kiva.loc[df_kiva.loc[:, 'use'].isna(),:]
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 140 | 2975.0 | 2975.0 | Food Production/Sales | Food | NaN | TZ | Tanzania | NaN | TZS | 10.0 | 110 | NaN | monthly |
| 145 | 1200.0 | 1200.0 | Personal Expenses | Personal Use | NaN | PE | Peru | NaN | PEN | 20.0 | 44 | NaN | monthly |
| 170 | 4250.0 | 4250.0 | Catering | Food | NaN | TZ | Tanzania | NaN | TZS | 10.0 | 116 | NaN | monthly |
| 412 | 2350.0 | 2350.0 | Beauty Salon | Services | NaN | TZ | Tanzania | NaN | TZS | 10.0 | 75 | NaN | monthly |
| 414 | 725.0 | 725.0 | Agriculture | Agriculture | NaN | SV | El Salvador | NaN | USD | 20.0 | 19 | NaN | monthly |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 671151 | 0.0 | 25.0 | Livestock | Agriculture | NaN | KE | Kenya | NaN | KES | 13.0 | 0 | NaN | monthly |
| 671174 | 0.0 | 25.0 | Games | Entertainment | NaN | KE | Kenya | NaN | KES | 13.0 | 0 | NaN | monthly |
| 671178 | 0.0 | 25.0 | Livestock | Agriculture | NaN | KE | Kenya | NaN | KES | 13.0 | 0 | NaN | monthly |
| 671185 | 0.0 | 25.0 | Livestock | Agriculture | NaN | KE | Kenya | NaN | KES | 13.0 | 0 | NaN | monthly |
| 671202 | 0.0 | 25.0 | Games | Entertainment | NaN | KE | Kenya | NaN | KES | 13.0 | 0 | NaN | monthly |
4232 rows × 13 columns
# Replace NaN Werte als Unknown
df_kiva['use'].fillna('Unknown', inplace=True)
# Überprüft das Ergebnisse
#df_kiva.loc[df_kiva.loc[:, 'use'].isna(), :]
df_kiva['use'].isnull().sum()
0
Hypothese: Country_code hängt mit Country zusammen --> Spalten müssen zusammen betrachet werden
Überprüfung der Hypothese: sind meine fehlenden Werte in Country_code auch diejenigen die kein Country haben?
Möglichkeit 1: Data dictionary --> bestätigt den Zusammenhang "NA" verwenden
Möglichkeit 2: technische Überprüfung
Ländercode nach ISO-2 Norm
df_kiva.loc[df_kiva.loc[:, 'country_code'].isna(), :] # 8 Zeilen
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 202537 | 4150.0 | 4150.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | EEnhana | NAD | 6.0 | 162 | female | bullet |
| 202823 | 4150.0 | 4150.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | Rundu | NAD | 6.0 | 159 | male | bullet |
| 344929 | 3325.0 | 3325.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | EEnhana | NAD | 7.0 | 120 | female | bullet |
| 351177 | 3325.0 | 3325.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | Rundu | NAD | 7.0 | 126 | male | bullet |
| 420953 | 3325.0 | 3325.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | EEnhana | NAD | 7.0 | 118 | female | bullet |
| 421218 | 4000.0 | 4000.0 | Wholesale | Wholesale | purchase solar lighting products for sale to l... | NaN | Namibia | Rundu | NAD | 7.0 | 150 | male | bullet |
| 487207 | 5100.0 | 5100.0 | Renewable Energy Products | Retail | to pay for stock of solar lights and cell phon... | NaN | Namibia | Katima Mulilo | NAD | 7.0 | 183 | male | bullet |
| 487653 | 5000.0 | 5000.0 | Wholesale | Wholesale | to maintain a stock of solar lights and cell p... | NaN | Namibia | Oshakati | NAD | 7.0 | 183 | female | bullet |
# df_kiva.loc[df_kiva.loc[:, 'country_code'].isna(), :].replace(to_replace="country_code",value="NA", inplace=True)
df_kiva.loc[df_kiva['country_code'].isna(), 'country_code'] = 'NA'
# df_ecar.replace(to_replace={"Seats":signs}, value=np.nan, inplace=True)
df_kiva.loc[df_kiva.loc[:, 'country_code'].isna(),:]
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval |
|---|
df_kiva['country_code'].isnull().sum()
0
df_kiva.loc[:, 'country_code'].isna()
0 False
1 False
2 False
3 False
4 False
...
671200 False
671201 False
671202 False
671203 False
671204 False
Name: country_code, Length: 671205, dtype: bool
Hypothese: region hängt mit Country zusammen --> Spalten müssen zusammen betrachet werden
Überprüfung der Hypothese: sind meine fehlenden Werte in region auch diejenigen die kein Country haben?
Möglichkeit : Data dictionary --> bestätigt den Zusammenhang "Unknown" verwenden
print(df_kiva["country"].unique().shape) # 87 countries in df_kiva
df_kiva.loc[df_kiva.loc[:, 'region'].isna(),:]
(87,)
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 250.0 | 250.0 | Services | Services | purchase leather for my business using ksh 20000. | KE | Kenya | NaN | KES | 4.0 | 6 | female | irregular |
| 49 | 450.0 | 450.0 | General Store | Retail | to stock his store. | SV | El Salvador | NaN | USD | 14.0 | 18 | male | monthly |
| 54 | 225.0 | 225.0 | Food Market | Food | to purchase various seasonal items to resell: ... | SN | Senegal | NaN | XOF | 14.0 | 7 | female | monthly |
| 67 | 125.0 | 125.0 | Energy | Services | purchase solar lanterns for resale. | KE | Kenya | NaN | KES | 3.0 | 6 | male | irregular |
| 70 | 2000.0 | 2000.0 | Retail | Retail | to install a display window and a sunshade for... | IQ | Iraq | NaN | USD | 15.0 | 71 | male | monthly |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 671197 | 0.0 | 25.0 | Livestock | Agriculture | Pretend the issue with loan got addressed by K... | KE | Kenya | NaN | KES | 13.0 | 0 | female | monthly |
| 671201 | 25.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | NaN | KES | 13.0 | 1 | female | monthly |
| 671202 | 0.0 | 25.0 | Games | Entertainment | Unknown | KE | Kenya | NaN | KES | 13.0 | 0 | NaN | monthly |
| 671203 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | NaN | KES | 13.0 | 0 | female | monthly |
| 671204 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | NaN | KES | 13.0 | 0 | female | monthly |
56800 rows × 13 columns
# Replace alle NaN Werte mit 'Unknown'
df_kiva['region'].fillna('Unknown', inplace=True)
# Überprüfen noch mal das Ergebnisse
#df_kiva.loc[df_kiva.loc[:, 'region'].isna(),:]
df_kiva['region'].isnull().sum()
0
Geschlecht und Anzahl der Darlehensnehmer, also diejenigen die das Crowdprojekt initiiert haben
Wir werden "use" für dieses Zweck behandeln.
- "Unknown" : für alle fehlende Werte
df_kiva.loc[df_kiva["borrower_genders"].isna(),:]
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 140 | 2975.0 | 2975.0 | Food Production/Sales | Food | Unknown | TZ | Tanzania | Unknown | TZS | 10.0 | 110 | NaN | monthly |
| 145 | 1200.0 | 1200.0 | Personal Expenses | Personal Use | Unknown | PE | Peru | Unknown | PEN | 20.0 | 44 | NaN | monthly |
| 170 | 4250.0 | 4250.0 | Catering | Food | Unknown | TZ | Tanzania | Unknown | TZS | 10.0 | 116 | NaN | monthly |
| 412 | 2350.0 | 2350.0 | Beauty Salon | Services | Unknown | TZ | Tanzania | Unknown | TZS | 10.0 | 75 | NaN | monthly |
| 414 | 725.0 | 725.0 | Agriculture | Agriculture | Unknown | SV | El Salvador | Unknown | USD | 20.0 | 19 | NaN | monthly |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 671151 | 0.0 | 25.0 | Livestock | Agriculture | Unknown | KE | Kenya | Unknown | KES | 13.0 | 0 | NaN | monthly |
| 671174 | 0.0 | 25.0 | Games | Entertainment | Unknown | KE | Kenya | Unknown | KES | 13.0 | 0 | NaN | monthly |
| 671178 | 0.0 | 25.0 | Livestock | Agriculture | Unknown | KE | Kenya | Unknown | KES | 13.0 | 0 | NaN | monthly |
| 671185 | 0.0 | 25.0 | Livestock | Agriculture | Unknown | KE | Kenya | Unknown | KES | 13.0 | 0 | NaN | monthly |
| 671202 | 0.0 | 25.0 | Games | Entertainment | Unknown | KE | Kenya | Unknown | KES | 13.0 | 0 | NaN | monthly |
4221 rows × 13 columns
# Replace alle NaN Werte mit 'Unknown'
df_kiva['borrower_genders'].fillna('Unknown', inplace=True)
# Überprüfen nochmal das Ergebnisse
#df_kiva.loc[df_kiva["borrower_genders"].isna(),:]
df_kiva['borrower_genders'].isnull().sum()
0
df_kiva.dtypes
funded_amount float64 loan_amount float64 activity object sector object use object country_code object country object region object currency object term_in_months float64 lender_count int64 borrower_genders object repayment_interval object dtype: object
df_kiva.memory_usage(deep=True)
Index 5369640 funded_amount 5369640 loan_amount 5369640 activity 46521289 sector 43603755 use 78992889 country_code 39601095 country 43923974 region 47895732 currency 40272300 term_in_months 5369640 lender_count 5369640 borrower_genders 47174152 repayment_interval 43400106 dtype: int64
# umwandlung von float in den int
df_kiva['term_in_months'] = df_kiva['term_in_months'].astype(int)
df_kiva.dtypes
funded_amount float64 loan_amount float64 activity object sector object use object country_code object country object region object currency object term_in_months int32 lender_count int64 borrower_genders object repayment_interval object dtype: object
df_kiva['term_in_months'].unique()
array([ 12, 11, 43, 14, 4, 13, 10, 8, 5, 20, 7, 3, 17,
15, 9, 39, 23, 26, 6, 30, 22, 25, 27, 16, 52, 18,
36, 19, 28, 62, 32, 24, 21, 50, 38, 2, 35, 72, 137,
1, 49, 33, 42, 29, 37, 51, 113, 79, 31, 44, 74, 34,
48, 114, 40, 124, 104, 63, 85, 78, 70, 86, 61, 60, 67,
55, 53, 41, 68, 143, 77, 130, 45, 111, 134, 107, 142, 148,
56, 122, 133, 141, 110, 81, 106, 54, 147, 112, 59, 145, 121,
109, 80, 47, 97, 75, 101, 128, 98, 87, 71, 66, 46, 125,
76, 73, 120, 144, 118, 131, 65, 108, 58, 123, 84, 99, 82,
92, 69, 91, 57, 90, 93, 129, 89, 88, 64, 126, 138, 158,
83, 100, 105, 132, 96, 127, 135, 95, 154, 156, 94, 115, 102,
116, 136, 103, 139, 146])
df_kiva.memory_usage(deep=True)
Index 5369640 funded_amount 5369640 loan_amount 5369640 activity 46521289 sector 43603755 use 78992889 country_code 39601095 country 43923974 region 47895732 currency 40272300 term_in_months 2684820 lender_count 5369640 borrower_genders 47174152 repayment_interval 43400106 dtype: int64
Achtung: funktioniert hier nur, wenn wir einen kleinen Datensatz haben
# benötigte bib
import seaborn as sns
sns.pairplot(data=df_kiva,corner=True)
<seaborn.axisgrid.PairGrid at 0x1ce8c54d4b0>
Weil unsere Daten 671,205 Datensätze umfassen, funktioniert das Pairlot nicht.
Somit stellen Ausreißer keine Ausreißer im statistischen Sinne dar, sondern sind als Extremwerte einzustufen und somit beizubehalten.
Wie kann ich kategorische Daten hinsichtlich auf Ausreißer überprüfen?
df_agg = df_kiva.groupby("sector").size()
df_agg
sector Agriculture 180302 Arts 12060 Clothing 32742 Construction 6268 Education 31013 Entertainment 830 Food 136657 Health 9223 Housing 33731 Manufacturing 6208 Personal Use 36385 Retail 124494 Services 45140 Transportation 15518 Wholesale 634 dtype: int64
df_agg1 = df_kiva.groupby("activity").size()
df_agg1
activity
Adult Care 2
Agriculture 27023
Air Conditioning 36
Animal Sales 9237
Aquaculture 108
...
Water Distribution 564
Weaving 2961
Wedding Expenses 405
Well digging 42
Wholesale 357
Length: 163, dtype: int64
sns.barplot(x=df_agg, y=df_agg.index)
<Axes: ylabel='sector'>
Auswertung
Z.B: Identifizierung von Ausreißern in der term_in_months Spalte
sns.boxplot(x="term_in_months", data=df_kiva)
<Axes: xlabel='term_in_months'>
Auswertung:
Hier würden term_in_months über 100 Monate detaliert anschauen.
Würde ich diesen Datenpunkt nicht entfernen, weil die Projekte wirklich existiert und vielleicht abgeschlossen
# df_kiva.loc[df_kiva["term_in_months"]>100,:]
Eine borrower_genders aus der z.B. die Total Anzahl des genders und auch einzehl Genders extrahiert werden könntne, existiert nicht in dem Datensatz.
Anzahl = []
bg_male = []
bg_female = []
for i in df_kiva.loc[:,'borrower_genders']:
gender_list = str(i).split(', ')
female_count = gender_list.count('female')
male_count = gender_list.count('male')
bg_male.append(male_count)
bg_female.append(female_count)
total_num = female_count + male_count
Anzahl.append(total_num)
#print("Anzahl:", total_num)
df_kiva_gender = df_kiva.copy()
Anzahl_Genders = "Anzahl_Genders"
bg_male1 = "bg_male"
bg_female1 = "bg_female"
df_kiva_gender[Anzahl_Genders] = Anzahl
df_kiva_gender[bg_female1] = bg_female
df_kiva_gender[bg_male1] = bg_male
df_kiva_gender
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | Anzahl_Genders | bg_female | bg_male | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 300.0 | 300.0 | Fruits & Vegetables | Food | To buy seasonal, fresh fruits to sell. | PK | Pakistan | Lahore | PKR | 12 | 12 | female | irregular | 1 | 1 | 0 |
| 1 | 575.0 | 575.0 | Rickshaw | Transportation | to repair and maintain the auto rickshaw used ... | PK | Pakistan | Lahore | PKR | 11 | 14 | female, female | irregular | 2 | 2 | 0 |
| 2 | 150.0 | 150.0 | Transportation | Transportation | To repair their old cycle-van and buy another ... | IN | India | Maynaguri | INR | 43 | 6 | female | bullet | 1 | 1 | 0 |
| 3 | 200.0 | 200.0 | Embroidery | Arts | to purchase an embroidery machine and a variet... | PK | Pakistan | Lahore | PKR | 11 | 8 | female | irregular | 1 | 1 | 0 |
| 4 | 400.0 | 400.0 | Milk Sales | Food | to purchase one buffalo. | PK | Pakistan | Abdul Hakeem | PKR | 14 | 16 | female | monthly | 1 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 671200 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'para compara: cemento, arenya y ladri... | PY | Paraguay | Concepción | USD | 13 | 0 | female | monthly | 1 | 1 | 0 |
| 671201 | 25.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | Unknown | KES | 13 | 1 | female | monthly | 1 | 1 | 0 |
| 671202 | 0.0 | 25.0 | Games | Entertainment | Unknown | KE | Kenya | Unknown | KES | 13 | 0 | Unknown | monthly | 0 | 0 | 0 |
| 671203 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | Unknown | KES | 13 | 0 | female | monthly | 1 | 1 | 0 |
| 671204 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | Unknown | KES | 13 | 0 | female | monthly | 1 | 1 | 0 |
671205 rows × 16 columns
Ziel: kann ich Spalten kombinieren um einen neuen Informationsgehalt zu gewinnen?
bezogen auf den Datensatz:
# Zeigen (%)Erfolgsprozentsatz = (Erreichter Betrag / Zielbetrag) * 100
prozent = 0
success_fund = []
for index, row in df_kiva_gender.iterrows():
funded_amount = row['funded_amount']
loan_amount = row['loan_amount']
if loan_amount != 0:
prozent = (funded_amount / loan_amount) * 100
success_fund.append(prozent)
df_success_fund = df_kiva_gender.copy()
success_fund1 = "success_fund"
df_success_fund[success_fund1] = success_fund
df_success_fund
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | Anzahl_Genders | bg_female | bg_male | success_fund | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 300.0 | 300.0 | Fruits & Vegetables | Food | To buy seasonal, fresh fruits to sell. | PK | Pakistan | Lahore | PKR | 12 | 12 | female | irregular | 1 | 1 | 0 | 100.0 |
| 1 | 575.0 | 575.0 | Rickshaw | Transportation | to repair and maintain the auto rickshaw used ... | PK | Pakistan | Lahore | PKR | 11 | 14 | female, female | irregular | 2 | 2 | 0 | 100.0 |
| 2 | 150.0 | 150.0 | Transportation | Transportation | To repair their old cycle-van and buy another ... | IN | India | Maynaguri | INR | 43 | 6 | female | bullet | 1 | 1 | 0 | 100.0 |
| 3 | 200.0 | 200.0 | Embroidery | Arts | to purchase an embroidery machine and a variet... | PK | Pakistan | Lahore | PKR | 11 | 8 | female | irregular | 1 | 1 | 0 | 100.0 |
| 4 | 400.0 | 400.0 | Milk Sales | Food | to purchase one buffalo. | PK | Pakistan | Abdul Hakeem | PKR | 14 | 16 | female | monthly | 1 | 1 | 0 | 100.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 671200 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'para compara: cemento, arenya y ladri... | PY | Paraguay | Concepción | USD | 13 | 0 | female | monthly | 1 | 1 | 0 | 0.0 |
| 671201 | 25.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | Unknown | KES | 13 | 1 | female | monthly | 1 | 1 | 0 | 100.0 |
| 671202 | 0.0 | 25.0 | Games | Entertainment | Unknown | KE | Kenya | Unknown | KES | 13 | 0 | Unknown | monthly | 0 | 0 | 0 | 0.0 |
| 671203 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | Unknown | KES | 13 | 0 | female | monthly | 1 | 1 | 0 | 0.0 |
| 671204 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | Unknown | KES | 13 | 0 | female | monthly | 1 | 1 | 0 | 0.0 |
671205 rows × 17 columns
df_success_fund["success_fund"].unique()
array([100. , 85.5 , 80.20833333, ..., 52.21238938,
49.66442953, 8.97435897])
df_final = df_success_fund.copy()
country_mapping = {
'AF': 'Africa',
'AL': 'Europe',
'AM': 'Asia',
'AZ': 'Asia',
'BF': 'Africa',
'BI': 'Africa',
'BJ': 'Africa',
'BO': 'South America',
'BR': 'South America',
'BT': 'Asia',
'BZ': 'North America',
'CD': 'Africa',
'CG': 'Africa',
'CI': 'Africa',
'CL': 'South America',
'CM': 'Africa',
'CN': 'Asia',
'CO': 'South America',
'CR': 'North America',
'DO': 'North America',
'EC': 'South America',
'EG': 'Africa',
'GE': 'Asia',
'GH': 'Africa',
'GT': 'South America',
'GU': 'Oceania',
'HN': 'North America',
'HT': 'North America',
'ID': 'Asia',
'IL': 'Asia',
'IN': 'Asia',
'IQ': 'Asia',
'JO': 'Asia',
'KE': 'Africa',
'KG': 'Asia',
'KH': 'Asia',
'LA': 'Asia',
'LB': 'Asia',
'LR': 'Africa',
'LS': 'Africa',
'MD': 'Europe',
'MG': 'Africa',
'ML': 'Africa',
'MM': 'Asia',
'MN': 'Asia',
'MR': 'Africa',
'MW': 'Africa',
'MX': 'North America',
'MZ': 'Africa',
'NA': 'Africa',
'NG': 'Africa',
'NI': 'North America',
'NP': 'Asia',
'PA': 'North America',
'PE': 'South America',
'PH': 'Asia',
'PK': 'Asia',
'PR': 'North America',
'PS': 'Asia',
'PY': 'South America',
'RW': 'Africa',
'SB': 'Oceania',
'SL': 'Africa',
'SN': 'Africa',
'SO': 'Africa',
'SR': 'South America',
'SS': 'Africa',
'SV': 'North America',
'TG': 'Africa',
'TH': 'Asia',
'TJ': 'Asia',
'TL': 'Asia',
'TR': 'Asia',
'TZ': 'Africa' ,
'UA': 'Europe',
'UG': 'Africa',
'US': 'North America',
'VC': 'North America',
'VI': 'North America',
'VN': 'Asia',
'VU': 'Oceania',
'WS': 'Oceania',
'XK': 'Europe',
'YE': 'Asia' ,
'ZA': 'Africa',
'ZM': 'Africa',
'ZW': 'Africa'
}
df_final.loc[:,"continent"] = df_final.loc[:,"country_code"].map(country_mapping)
df_final.head()
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | Anzahl_Genders | bg_female | bg_male | success_fund | continent | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 300.0 | 300.0 | Fruits & Vegetables | Food | To buy seasonal, fresh fruits to sell. | PK | Pakistan | Lahore | PKR | 12 | 12 | female | irregular | 1 | 1 | 0 | 100.0 | Asia |
| 1 | 575.0 | 575.0 | Rickshaw | Transportation | to repair and maintain the auto rickshaw used ... | PK | Pakistan | Lahore | PKR | 11 | 14 | female, female | irregular | 2 | 2 | 0 | 100.0 | Asia |
| 2 | 150.0 | 150.0 | Transportation | Transportation | To repair their old cycle-van and buy another ... | IN | India | Maynaguri | INR | 43 | 6 | female | bullet | 1 | 1 | 0 | 100.0 | Asia |
| 3 | 200.0 | 200.0 | Embroidery | Arts | to purchase an embroidery machine and a variet... | PK | Pakistan | Lahore | PKR | 11 | 8 | female | irregular | 1 | 1 | 0 | 100.0 | Asia |
| 4 | 400.0 | 400.0 | Milk Sales | Food | to purchase one buffalo. | PK | Pakistan | Abdul Hakeem | PKR | 14 | 16 | female | monthly | 1 | 1 | 0 | 100.0 | Asia |
# sind Datentypen passend?
df_final.dtypes
funded_amount float64 loan_amount float64 activity object sector object use object country_code object country object region object currency object term_in_months int32 lender_count int64 borrower_genders object repayment_interval object Anzahl_Genders int64 bg_female int64 bg_male int64 success_fund float64 continent object dtype: object
# über nunique() die anzahl der unique werte herausfinden, um herauszufinden, ob der category datentyp Sinn macht
liste = df_final.select_dtypes("object").columns
for spalte in liste:
print(spalte, df_final.loc[:,spalte].nunique())
activity 163 sector 15 use 424913 country_code 87 country 87 region 12696 currency 67 borrower_genders 11299 repayment_interval 4 continent 6
Auswertung:
Alle Spalte mit Objekt Datentyp könnten eine Umwandlung in den Category Datentyp Sinn verwenden. Weil: Speichereffizienz, Verbesserte Performance, Reduzierte Speicherbelastung und Vereinfachte Datenanalyse.
# Umwandlung in Category Datentyp - mit astype()
df_kiva_opti = df_final.copy()
liste_category = ['activity','sector','use','country_code','country','region','currency','borrower_genders','repayment_interval','continent']
for spalte in liste_category:
df_kiva_opti.loc[:,spalte] = df_final.loc[:,spalte].astype('category')
# check, ob Umwandlung erfolgreich
df_kiva_opti.dtypes
C:\Users\alfa\AppData\Local\Temp\ipykernel_3240\2420383145.py:8: DeprecationWarning: In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
df_kiva_opti.loc[:,spalte] = df_final.loc[:,spalte].astype('category')
C:\Users\alfa\AppData\Local\Temp\ipykernel_3240\2420383145.py:8: DeprecationWarning: In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
df_kiva_opti.loc[:,spalte] = df_final.loc[:,spalte].astype('category')
C:\Users\alfa\AppData\Local\Temp\ipykernel_3240\2420383145.py:8: DeprecationWarning: In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
df_kiva_opti.loc[:,spalte] = df_final.loc[:,spalte].astype('category')
C:\Users\alfa\AppData\Local\Temp\ipykernel_3240\2420383145.py:8: DeprecationWarning: In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
df_kiva_opti.loc[:,spalte] = df_final.loc[:,spalte].astype('category')
C:\Users\alfa\AppData\Local\Temp\ipykernel_3240\2420383145.py:8: DeprecationWarning: In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
df_kiva_opti.loc[:,spalte] = df_final.loc[:,spalte].astype('category')
C:\Users\alfa\AppData\Local\Temp\ipykernel_3240\2420383145.py:8: DeprecationWarning: In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
df_kiva_opti.loc[:,spalte] = df_final.loc[:,spalte].astype('category')
C:\Users\alfa\AppData\Local\Temp\ipykernel_3240\2420383145.py:8: DeprecationWarning: In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
df_kiva_opti.loc[:,spalte] = df_final.loc[:,spalte].astype('category')
C:\Users\alfa\AppData\Local\Temp\ipykernel_3240\2420383145.py:8: DeprecationWarning: In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
df_kiva_opti.loc[:,spalte] = df_final.loc[:,spalte].astype('category')
C:\Users\alfa\AppData\Local\Temp\ipykernel_3240\2420383145.py:8: DeprecationWarning: In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
df_kiva_opti.loc[:,spalte] = df_final.loc[:,spalte].astype('category')
C:\Users\alfa\AppData\Local\Temp\ipykernel_3240\2420383145.py:8: DeprecationWarning: In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
df_kiva_opti.loc[:,spalte] = df_final.loc[:,spalte].astype('category')
funded_amount float64 loan_amount float64 activity category sector category use category country_code category country category region category currency category term_in_months int32 lender_count int64 borrower_genders category repayment_interval category Anzahl_Genders int64 bg_female int64 bg_male int64 success_fund float64 continent category dtype: object
df_kiva_opti.memory_usage(deep=True)
Index 5369640 funded_amount 5369640 loan_amount 5369640 activity 1357871 sector 672752 use 72813831 country_code 678442 country 679029 region 2911520 currency 677329 term_in_months 2684820 lender_count 5369640 borrower_genders 3539048 repayment_interval 671633 Anzahl_Genders 5369640 bg_female 5369640 bg_male 5369640 success_fund 5369640 continent 671768 dtype: int64
# Optimierung des Speicherplatzes bei den floats
df_kiva_opti.loc[:,["funded_amount","loan_amount","success_fund"]] = df_kiva_opti[["funded_amount","loan_amount","success_fund"]].apply(pd.to_numeric, downcast="float")
df_kiva_opti.dtypes
C:\Users\alfa\AppData\Local\Temp\ipykernel_3240\3400184989.py:3: DeprecationWarning: In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)` df_kiva_opti.loc[:,["funded_amount","loan_amount","success_fund"]] = df_kiva_opti[["funded_amount","loan_amount","success_fund"]].apply(pd.to_numeric, downcast="float")
funded_amount float32 loan_amount float32 activity category sector category use category country_code category country category region category currency category term_in_months int32 lender_count int64 borrower_genders category repayment_interval category Anzahl_Genders int64 bg_female int64 bg_male int64 success_fund float32 continent category dtype: object
# Optimierung des Speicherplatzes bei den ints
df_kiva_opti.loc[:,["term_in_months","lender_count","Anzahl_Genders","bg_male","bg_female"]] = df_kiva_opti[["term_in_months","lender_count","Anzahl_Genders","bg_male","bg_female"]].apply(pd.to_numeric, downcast="integer")
df_kiva_opti.dtypes
C:\Users\alfa\AppData\Local\Temp\ipykernel_3240\164655627.py:3: DeprecationWarning: In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)` df_kiva_opti.loc[:,["term_in_months","lender_count","Anzahl_Genders","bg_male","bg_female"]] = df_kiva_opti[["term_in_months","lender_count","Anzahl_Genders","bg_male","bg_female"]].apply(pd.to_numeric, downcast="integer")
funded_amount float32 loan_amount float32 activity category sector category use category country_code category country category region category currency category term_in_months int16 lender_count int16 borrower_genders category repayment_interval category Anzahl_Genders int8 bg_female int8 bg_male int8 success_fund float32 continent category dtype: object
# Reduzierung des Speicherplatzes
reduction = (df_final.memory_usage(deep=True).sum()-df_kiva_opti.memory_usage(deep=True).sum())/df_final.memory_usage(deep=True).sum()
print(f"{reduction:0.2f}")
0.80
Ergebnis:
Wir haben unseren Datensatz um mehr als die Hälfte reduziert.
df_kiva_opti
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | Anzahl_Genders | bg_female | bg_male | success_fund | continent | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 300.0 | 300.0 | Fruits & Vegetables | Food | To buy seasonal, fresh fruits to sell. | PK | Pakistan | Lahore | PKR | 12 | 12 | female | irregular | 1 | 1 | 0 | 100.0 | Asia |
| 1 | 575.0 | 575.0 | Rickshaw | Transportation | to repair and maintain the auto rickshaw used ... | PK | Pakistan | Lahore | PKR | 11 | 14 | female, female | irregular | 2 | 2 | 0 | 100.0 | Asia |
| 2 | 150.0 | 150.0 | Transportation | Transportation | To repair their old cycle-van and buy another ... | IN | India | Maynaguri | INR | 43 | 6 | female | bullet | 1 | 1 | 0 | 100.0 | Asia |
| 3 | 200.0 | 200.0 | Embroidery | Arts | to purchase an embroidery machine and a variet... | PK | Pakistan | Lahore | PKR | 11 | 8 | female | irregular | 1 | 1 | 0 | 100.0 | Asia |
| 4 | 400.0 | 400.0 | Milk Sales | Food | to purchase one buffalo. | PK | Pakistan | Abdul Hakeem | PKR | 14 | 16 | female | monthly | 1 | 1 | 0 | 100.0 | Asia |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 671200 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'para compara: cemento, arenya y ladri... | PY | Paraguay | Concepción | USD | 13 | 0 | female | monthly | 1 | 1 | 0 | 0.0 | South America |
| 671201 | 25.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | Unknown | KES | 13 | 1 | female | monthly | 1 | 1 | 0 | 100.0 | Africa |
| 671202 | 0.0 | 25.0 | Games | Entertainment | Unknown | KE | Kenya | Unknown | KES | 13 | 0 | Unknown | monthly | 0 | 0 | 0 | 0.0 | Africa |
| 671203 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | Unknown | KES | 13 | 0 | female | monthly | 1 | 1 | 0 | 0.0 | Africa |
| 671204 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | Unknown | KES | 13 | 0 | female | monthly | 1 | 1 | 0 | 0.0 | Africa |
671205 rows × 18 columns
# unsere df csv zwischenspeichern, um diese für die visualiserung zu nutzen
df_kiva_opti.to_csv('kiva_after_preprocessing.csv',sep=',')
# alternativ: to_pickle Vorteil: Abspeicherung als DataFrame inkl. seiner Eigenschaften zum Speicherplatz
# Nachteil: wenn Versionen im Team nicht stimmen, kann der andere den DF nicht als pickle einlesen
df_kiva_opti.to_pickle('kiva_after_preprocessing.pkl')
Das Ziel einer Crowdfunding-Investitionsplattform besteht darin, eine Online-Plattform bereitzustellen, auf der Investoren Geldmittel für verschiedene Projekte oder Unternehmen bereitstellen können. Crowdfunding ist eine Methode der Finanzierung, bei der eine große Anzahl von Menschen (die "Crowd") kleine Geldbeträge beiträgt, um ein größeres Finanzierungsziel zu erreichen.
-- Die KPI (Key Performance Indicators) für diese Crowdinvesting-Plattform könnten wie folgt aussehen:
Finanzielles Ziel: Der Prozentsatz des Zielbetrags (loan_amount), der durch das Crowdfunding erreicht wurde. Dies gibt an, wie erfolgreich die Projekte darin waren, das erforderliche Geld einzusammeln.
Erfolgreich finanzierte Projekte: Die Anzahl der Projekte, die den Zielbetrag erreicht oder überschritten haben. Dies zeigt, wie viele Geschäftsideen tatsächlich erfolgreich finanziert wurden.
Investorenanzahl: Die Anzahl der Personen (lender_count), die Geld in die Projekte investiert haben. Eine höhere Investorenanzahl deutet auf ein größeres Interesse der Anleger an den Projekten hin.
Geschlechterdiversität: Die Verteilung der Darlehensnehmer nach Geschlecht. Dies zeigt, wie gut die Plattform die Geschlechtervielfalt bei den Initiatoren der Projekte fördert.
Rückzahlungsrate: Der Prozentsatz der Darlehen, die gemäß dem vereinbarten Rückzahlungsintervall (repayment interval) pünktlich zurückgezahlt wurden. Eine hohe Rückzahlungsrate deutet darauf hin, dass die Projekte finanziell erfolgreich waren und die Investoren ihr Geld zurückerhalten.
Geografische Reichweite: Die Anzahl der Länder (country), aus denen Projekte finanziert wurden, und die Anzahl der Regionen (region) innerhalb dieser Länder. Eine breite geografische Reichweite zeigt die globale Präsenz und Akzeptanz der Plattform an.
Sektorverteilung: Die Verteilung der Projekte nach Sektoren (sector) und Unterkategorien (activity). Dies zeigt, in welchen Branchen und Themen die meisten Projekte finanziert wurden und wo das größte Interesse der Investoren liegt.
Diese KPIs ermöglichen es, den Erfolg und die Leistung der Crowdinvesting-Plattform zu messen und Verbesserungsbereiche zu identifizieren, um die Zufriedenheit der Projektinitiatoren und Investoren zu steigern.
import pandas as pd
import plotly.express as px
import numpy as np
# datensatz laden mit pickle
df_kiva_visual = pd.read_pickle('kiva_after_preprocessing.pkl')
df_kiva_visual.head(2)
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | Anzahl_Genders | bg_female | bg_male | success_fund | continent | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 300.0 | 300.0 | Fruits & Vegetables | Food | To buy seasonal, fresh fruits to sell. | PK | Pakistan | Lahore | PKR | 12 | 12 | female | irregular | 1 | 1 | 0 | 100.0 | Asia |
| 1 | 575.0 | 575.0 | Rickshaw | Transportation | to repair and maintain the auto rickshaw used ... | PK | Pakistan | Lahore | PKR | 11 | 14 | female, female | irregular | 2 | 2 | 0 | 100.0 | Asia |
--> ein Verständnis dafür zu entwickeln, wie weit verbreitet und anerkannt die Kiva-Plattform auf globaler Ebene ist
--> Erkenntnisse über die globale Präsenz und Akzeptanz von Kiva zu gewinnen, um fundierte Entscheidungen zur Weiterentwicklung und Skalierung der Plattform treffen zu können
df_geo = df_kiva_visual.groupby(by=["continent","country"]).agg(count=("country","count")).reset_index()
df_1 = df_geo.loc[df_geo["count"]>0,:]
df_1
| continent | country | count | |
|---|---|---|---|
| 0 | Africa | Afghanistan | 2 |
| 5 | Africa | Benin | 497 |
| 9 | Africa | Burkina Faso | 2460 |
| 10 | Africa | Burundi | 880 |
| 12 | Africa | Cameroon | 2230 |
| ... | ... | ... | ... |
| 455 | South America | Ecuador | 13521 |
| 461 | South America | Guatemala | 7310 |
| 492 | South America | Paraguay | 11903 |
| 493 | South America | Peru | 22233 |
| 505 | South America | Suriname | 223 |
87 rows × 3 columns
data = {'Category': ['Total Countries', 'Kiva Countries'],
'Count': [195, 87]}
df = pd.DataFrame(data)
fig = px.pie(df, values='Count', names='Category', color='Category',title='Country Distribution')
fig.update_layout(legend_traceorder="normal",
legend_title_text='Number of Countries',
title_x=0.45
)
fig.update_traces(hoverinfo='percent + value',
textinfo='label + percent',
insidetextorientation='horizontal',
textposition='inside' ,
rotation=0
)
fig.show()
import plotly.express as px
plot_1 = px.choropleth(data_frame=df_1,
locations="country",
locationmode="country names",
color="count",
hover_name="continent",
projection="natural earth",
color_continuous_scale="Oranges",
title="Die Verteilung des Länder auf Kiva Plattform"
)
plot_1.update_layout(title_x=0.45)
plot_1.layout.legend.title = "Nr of projects"
plot_1.show()
Auswertung
"Sollzustand"
Um den Sollzustand zu ermitteln, ist die Frage die genauen Erwartungen auf die Anzahl der Länder, die in das Projekt involviert sind, im Laufe der Zeit wächst relevant.
Die Erwartung könnte daher sein, dass die Kiva-Plattform eine breite geografische Abdeckung hat und Länder auf verschiedenen Kontinenten einbezieht
"Vergleich Soll- und Istzustand"
Was ist mein Ziel?
Diese Länder identifizieren, die einen großen Anteil an fAnanzierten Projekten bzw. Beträgen auf der Kiva-Plattform haben.
Dies kann uns dabei helfen, sich auf diese Länder zu konzentrieren und gezielt Maßnahmen zu ergreifen, um das Wachstumspotenzial in diesen Ländern zu verstehen.
df_country = df_final.groupby(by=["continent","country","sector","activity"],as_index=False).agg(count_project=("activity","count"),total_investment=("funded_amount","sum"))
df_country.sort_values(by=["count_project","total_investment"],ascending=False)
| continent | country | sector | activity | count_project | total_investment | |
|---|---|---|---|---|---|---|
| 3619 | Asia | Philippines | Retail | General Store | 42960 | 15025550.0 |
| 506 | Africa | Kenya | Agriculture | Farming | 20555 | 10745600.0 |
| 3541 | Asia | Philippines | Agriculture | Pigs | 19985 | 6039975.0 |
| 2290 | Asia | Cambodia | Personal Use | Home Appliances | 16362 | 2731325.0 |
| 3537 | Asia | Philippines | Agriculture | Farming | 11924 | 4063150.0 |
| ... | ... | ... | ... | ... | ... | ... |
| 2866 | Asia | Jordan | Services | Cleaning Services | 1 | 0.0 |
| 4952 | North America | Mexico | Retail | Mobile Phones | 1 | 0.0 |
| 5187 | North America | Puerto Rico | Services | Energy | 1 | 0.0 |
| 5297 | North America | Virgin Islands | Retail | General Store | 1 | 0.0 |
| 5298 | North America | Virgin Islands | Services | Services | 1 | 0.0 |
6270 rows × 6 columns
df_country_project = df_country.groupby(by="country").agg(sum_project=("count_project","sum")).reset_index()
df_country_project.sort_values(by="sum_project",ascending=False)
| country | sum_project | |
|---|---|---|
| 59 | Philippines | 160441 |
| 34 | Kenya | 75825 |
| 22 | El Salvador | 39875 |
| 11 | Cambodia | 34836 |
| 54 | Pakistan | 26857 |
| ... | ... | ... |
| 83 | Virgin Islands | 2 |
| 0 | Afghanistan | 2 |
| 18 | Cote D'Ivoire | 1 |
| 25 | Guam | 1 |
| 44 | Mauritania | 1 |
87 rows × 2 columns
plot_2 = px.bar(data_frame=df_country_project,
x='country',
y='sum_project',
title='The distribution of funded projects',
labels={'country':'Countries','sum_project':'Number of Projects'},
#size='total_investment',
color='sum_project',
height=800,template="simple_white"
)
plot_2.update_traces(marker_line_width=1)
# es scheint als ob es eine Grenze bei 50k gibt --> daher lege ich eine vertical line an diese Stelle
plot_2.add_shape(type="line",
x0=0,
y0=20000,
x1=len(df_country_project['country']),
y1=20000,
line=dict(color="black", width=3, dash="dash"))
'''
plot_2.add_shape(type="line",
x0=0,
y0=40000,
x1=len(df_country_invest['country']),
y1=40000,
line=dict(color="black", width=3, dash="dash"))
'''
plot_2.update_layout(title_x=0.45)
plot_2.update_xaxes(categoryorder="total ascending")
plot_2.show()
# Top-3-Länder mit den meisten finanzierten Projekten:
df_country_project.sort_values(by="sum_project",ascending=False,inplace=True)
df_country_project.head(3)
| country | sum_project | |
|---|---|---|
| 59 | Philippines | 160441 |
| 34 | Kenya | 75825 |
| 22 | El Salvador | 39875 |
# Top-3-Länder mit relativen Sektor
df_philippine = df_country.loc[df_country["country"]=="Philippines",["country","sector","activity","total_investment"]]
df_2 = df_philippine.groupby(by="sector").agg(count_activity=("activity","count"),sum_investment=("total_investment","sum")).reset_index()
#print(df_2)
df_kenya = df_country.loc[df_country["country"]=="Kenya",["country","sector","activity","total_investment"]]
df_3 = df_kenya.groupby(by="sector").agg(count_activity=("activity","count"),sum_investment=("total_investment","sum")).reset_index()
#print(df_3)
df_ElSalvador = df_country.loc[df_country["country"]=="El Salvador",["country","sector","activity","total_investment"]]
df_4 = df_ElSalvador.groupby(by="sector").agg(count_activity=("activity","count"),sum_investment=("total_investment","sum")).reset_index()
#print(df_4)
import plotly.graph_objects as go
# Data for the pie charts
sizes_1 = df_2["count_activity"] # Sizes for pie chart 1
sizes_2 = df_3["count_activity"] # Sizes for pie chart 2
sizes_3 = df_4["count_activity"] # Sizes for pie chart 3
labels = ['Agriculture', 'Arts', 'Clothing', 'Construction', 'Education',
'Entertainment', 'Food', 'Health', 'Housing', 'Manufacturing',
'Personal Use', 'Retail', 'Services', 'Transportation',
'Wholesale'] # Labels for all pie charts
# Sort sizes and labels in descending order
sizes_1, labels = zip(*sorted(zip(sizes_1, labels), reverse=True))
sizes_2, _ = zip(*sorted(zip(sizes_2, labels), reverse=True))
sizes_3, _ = zip(*sorted(zip(sizes_3, labels), reverse=True))
# Create subplots with 1 row and 3 columns
fig = go.Figure()
# Add the first pie chart
fig.add_trace(go.Pie(
labels=labels,
values=sizes_1,
name="Philippine",
hole=0.3,
domain={"x": [0, 0.25]},
direction="clockwise",
rotation=0
))
# Add the second pie chart
fig.add_trace(go.Pie(
labels=labels,
values=sizes_2,
name="Kenya",
hole=0.3,
domain={"x": [0.35, 0.60]},
direction="clockwise",
rotation=0
))
# Add the third pie chart
fig.add_trace(go.Pie(
labels=labels,
values=sizes_3,
name="El Salvador",
hole=0.3,
domain={"x": [0.70, 0.95]},
direction="clockwise",
rotation=0
))
# Update layout and show the figure
fig.update_layout(
title="Top 3 Countries with the most invested projects in 15 sectors",
showlegend=True,
title_x=0.45,
annotations=[
dict(text="Philippine", x=0.09, y=1.1, showarrow=False),
dict(text="Kenya", x=0.48, y=1.1, showarrow=False),
dict(text="El Salvador", x=0.90, y=1.1, showarrow=False)
]
)
fig.show()
# Um den signifikanten Sektor in Bezug auf die Aktivitäten in den Ländern zu bestimmen
df_2a = df_philippine.groupby(by=["sector","activity"]).agg(count_activity=("activity","count"),sum_investment=("total_investment","sum")).reset_index()
df_2a.sort_values(by=["sum_investment"],ascending=False)
df_3a = df_kenya.groupby(by=["sector","activity"]).agg(count_activity=("activity","count"),sum_investment=("total_investment","sum")).reset_index()
df_3a.sort_values(by=["sum_investment"],ascending=False)
df_4a = df_ElSalvador.groupby(by=["sector","activity"]).agg(count_activity=("activity","count"),sum_investment=("total_investment","sum")).reset_index()
df_4a.sort_values(by="sum_investment",ascending=False)
| sector | activity | count_activity | sum_investment | |
|---|---|---|---|---|
| 62 | Housing | Personal Housing Expenses | 1 | 2939750.0 |
| 0 | Agriculture | Agriculture | 1 | 2610050.0 |
| 49 | Food | Food Production/Sales | 1 | 2463925.0 |
| 86 | Retail | General Store | 1 | 1914125.0 |
| 7 | Agriculture | Farming | 1 | 1510900.0 |
| ... | ... | ... | ... | ... |
| 54 | Food | Pub | 1 | 875.0 |
| 17 | Arts | Musical Instruments | 1 | 800.0 |
| 70 | Personal Use | Funerals | 1 | 500.0 |
| 136 | Wholesale | Wholesale | 1 | 500.0 |
| 105 | Retail | Sporting Good Sales | 1 | 300.0 |
137 rows × 4 columns
# Die Beziehung zwischen Sektors und Aktivitäte in Philippines
fig_philippine = px.scatter(df_2a, x='sector', y='activity', color='count_activity',
size='sum_investment', hover_data=['count_activity', 'sum_investment'],
title='Relationship between Sectors and Activities',
labels={'sector': 'Sector', 'activity': 'Activity'}
)
fig_philippine.update_layout(
xaxis=dict(title='Sectors', categoryorder='total descending'),
yaxis=dict(title='Activities'),
title='Phillipines', title_x=0.45,
legend=dict(title='Nr of activities')
)
fig_philippine.show()
# Die Beziehung zwischen Sektors und Aktivitäte in Kenya
fig_kenya = px.scatter(df_3a, x='sector', y='activity', color='count_activity',
size='sum_investment', hover_data=['count_activity', 'sum_investment'],
title='Relationship between Sectors and Activities',
labels={'sector': 'Sector', 'activity': 'Activity'}
)
fig_kenya.update_layout(
xaxis=dict(title='Sectors', categoryorder='total descending'),
yaxis=dict(title='Activities'),
title='Kenya', title_x=0.45,
legend=dict(title='Nr of activities')
)
fig_kenya.show()
# Die Beziehung zwischen Sektors und Aktivitäte in El Salvardor
fig_elsavaldor = px.scatter(df_4a, x='sector', y='activity', color='count_activity',
size='sum_investment', hover_data=['count_activity', 'sum_investment'],
title='Relationship between Sectors and Activities',
labels={'sector': 'Sector', 'activity': 'Activity'}
)
fig_elsavaldor.update_layout(
xaxis=dict(title='Sectors', categoryorder='total descending'),
yaxis=dict(title='Activities'),
title='EL Salvardor', title_x=0.45,
legend=dict(title='Nr of activities')
)
fig_elsavaldor.show()
--> Die top 3 Länder zu identifizieren,die die höchste Anzahl an Projekten aufweisen : Philippines, Kenya und EL Salvardor
--> Wie sieht diese Länder, die einen Großteil der finanzierten Projekt ausmachen?
--> Berechnen wir den Anteil jedes Landes an der Gesamtzahl der finanzierten Projekte
--> Um den signifikanten Sektor in Bezug auf die Aktivitäten in den Ländern zu bestimmen
--> möglicherweise Muster, Trends oder Merkmale identifizieren, die auf den Erfolg oder das Potenzial der Länder
"Sollzustand"
"Vergleich Soll- und Istzustand"
--> Mögliche Abweichungen könnten sein:
In der Grafik ist zu sehen, dass die Top-3-Länder eingene signifikanten Sektor in Bezug auf die Aktivitäten hat.
Z.B:
--> Philippines: die großanteil investierte Projekte auf Retail Sektor in General Store, Elektronik Güte und Sales Aktivität liegt
--> Kenya: die großanteil investierte Projekte auf Agricultur in Farming und Food Produktion Aktivität liegt
--> El Salvador: die großanteil investierte Projekte nicht nur auf Agricultur, Food sondern auch Retail, Housing Sektor in Fishing, Housing Personal Expense, Food Production und General Store Aktivitäte
Dass bedeutet:
--> Philippines: der Einzelhandel in den Philippinen eine bedeutende Rolle bei den von Kiva finanzierten Projekten spielt
--> Kenya: die landwirtschaftliche Entwicklung und Nahrungsmittelproduktion in Kenia durch die Kiva-Plattform unterstützt werden
--> El Salvador: Dies zeigt, dass El Salvador in verschiedenen Bereichen von der Kiva-Plattform unterstützt wird und eine breitere Palette von wirtschaftlichen Aktivitäten aufweist.
Um nun stellt sich die Frage, zu welcher Kategorie die Top-3-Länder gehören?
--> Die Kategorien der Länder zu definieren
--> Die Kiva-Länder in die definierten Kategorien zu gruppieren, um eine klare Strukturierung zu erreichen
--> Durch die Kategorisierung der Länder eine bessere Strukturierung und Analyse der Daten zu ermöglichen. Dadurch können wertvolle Erkenntnisse gewonnen werden, die für Entscheidungen und Handlungen im Hinblick auf attraktive Investitionsmöglichkeiten relevant sein können
Mithilfe von ChatGPT und unter Verwendung der wichtigsten Merkmale/Eigenschaften (Kontinente, Wirtschaftliche Entwicklung, Bevölkerungszahl, Politisches System, Geografische Merkmale, Einkommensniveau, Branchen oder Sektoren von Landwirtschaft, Industrie, Dienstleistungen oder Technologie) der drei führenden Länder habe ich die Komplexität der Kategorisierung der Länder in fünf Kategorien definiert:
# Mapping 5 Kategories für alle Länder
country_categories = {
'Afghanistan': 'Wirtschaftliches Potenzial',
'Albania': 'Wachstumschancen',
'Armenia': 'Rohstoffe',
'Azerbaijan': 'Infrastrukturprojekte',
'Belize': 'Technologische Entwicklung',
'Benin': 'Wachstumschancen',
'Bhutan': 'Wirtschaftliches Potenzial',
'Bolivia': 'Rohstoffe',
'Brazil': 'Wirtschaftliches Potenzial',
'Burkina Faso': 'Wachstumschancen',
'Burundi': 'Wachstumschancen',
'Cambodia': 'Technologische Entwicklung',
'Cameroon': 'Rohstoffe',
'Chile': 'Wirtschaftliches Potenzial',
'China': 'Wirtschaftliches Potenzial',
'Colombia': 'Rohstoffe',
'Congo': 'Rohstoffe',
'Costa Rica': 'Infrastrukturprojekte',
"Cote D'Ivoire": 'Wachstumschancen',
'Dominican Republic': 'Infrastrukturprojekte',
'Ecuador': 'Rohstoffe',
'Egypt': 'Wirtschaftliches Potenzial',
'El Salvador': 'Wirtschaftliches Potenzial',
'Georgia': 'Wachstumschancen',
'Ghana': 'Rohstoffe',
'Guam': 'Technologische Entwicklung',
'Guatemala': 'Infrastrukturprojekte',
'Haiti': 'Wirtschaftliches Potenzial',
'Honduras': 'Wachstumschancen',
'India': 'Wirtschaftliches Potenzial',
'Indonesia': 'Wachstumschancen',
'Iraq': 'Rohstoffe',
'Israel': 'Technologische Entwicklung',
'Jordan': 'Wachstumschancen',
'Kenya': 'Wachstumschancen',
'Kosovo': 'Wirtschaftliches Potenzial',
'Kyrgyzstan': 'Wachstumschancen',
"Lao People's Democratic Republic": 'Technologische Entwicklung',
'Lebanon': 'Infrastrukturprojekte',
'Lesotho': 'Wachstumschancen',
'Liberia': 'Wachstumschancen',
'Madagascar': 'Rohstoffe',
'Malawi': 'Wachstumschancen',
'Mali': 'Wirtschaftliches Potenzial',
'Mauritania': 'Wachstumschancen',
'Mexico': 'Wirtschaftliches Potenzial',
'Moldova': 'Wachstumschancen',
'Mongolia': 'Rohstoffe',
'Mozambique': 'Rohstoffe',
'Myanmar (Burma)': 'Wachstumschancen',
'Namibia': 'Wirtschaftliches Potenzial',
'Nepal': 'Wachstumschancen',
'Nicaragua': 'Wachstumschancen',
'Nigeria': 'Rohstoffe',
'Pakistan': 'Wirtschaftliches Potenzial',
'Palestine': 'Infrastrukturprojekte',
'Panama': 'Infrastrukturprojekte',
'Paraguay': 'Rohstoffe',
'Peru': 'Rohstoffe',
'Philippines': 'Technologische Entwicklung',
'Puerto Rico': 'Infrastrukturprojekte',
'Rwanda': 'Wachstumschancen',
'Saint Vincent and the Grenadines': 'Technologische Entwicklung',
'Samoa': 'Technologische Entwicklung',
'Senegal': 'Wachstumschancen',
'Sierra Leone': 'Wachstumschancen',
'Solomon Islands': 'Technologische Entwicklung',
'Somalia': 'Rohstoffe',
'South Africa': 'Rohstoffe',
'South Sudan': 'Wachstumschancen',
'Suriname': 'Rohstoffe',
'Tajikistan': 'Wachstumschancen',
'Tanzania': 'Wachstumschancen',
'Thailand': 'Technologische Entwicklung',
'The Democratic Republic of the Congo': 'Rohstoffe',
'Timor-Leste': 'Wachstumschancen',
'Togo': 'Wachstumschancen',
'Turkey': 'Wirtschaftliches Potenzial',
'Uganda': 'Wachstumschancen',
'Ukraine': 'Wachstumschancen',
'United States': 'Wirtschaftliches Potenzial',
'Vanuatu': 'Wachstumschancen',
'Vietnam': 'Wachstumschancen',
'Virgin Islands': 'Technologische Entwicklung',
'Yemen': 'Wachstumschancen',
'Zambia': 'Rohstoffe',
'Zimbabwe': 'Rohstoffe'
}
df_country_cat = df_kiva_visual.copy()
df_country_cat['Category'] = df_country_cat['country'].map(country_categories)
df_country_cat
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | Anzahl_Genders | bg_female | bg_male | success_fund | continent | Category | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 300.0 | 300.0 | Fruits & Vegetables | Food | To buy seasonal, fresh fruits to sell. | PK | Pakistan | Lahore | PKR | 12 | 12 | female | irregular | 1 | 1 | 0 | 100.0 | Asia | Wirtschaftliches Potenzial |
| 1 | 575.0 | 575.0 | Rickshaw | Transportation | to repair and maintain the auto rickshaw used ... | PK | Pakistan | Lahore | PKR | 11 | 14 | female, female | irregular | 2 | 2 | 0 | 100.0 | Asia | Wirtschaftliches Potenzial |
| 2 | 150.0 | 150.0 | Transportation | Transportation | To repair their old cycle-van and buy another ... | IN | India | Maynaguri | INR | 43 | 6 | female | bullet | 1 | 1 | 0 | 100.0 | Asia | Wirtschaftliches Potenzial |
| 3 | 200.0 | 200.0 | Embroidery | Arts | to purchase an embroidery machine and a variet... | PK | Pakistan | Lahore | PKR | 11 | 8 | female | irregular | 1 | 1 | 0 | 100.0 | Asia | Wirtschaftliches Potenzial |
| 4 | 400.0 | 400.0 | Milk Sales | Food | to purchase one buffalo. | PK | Pakistan | Abdul Hakeem | PKR | 14 | 16 | female | monthly | 1 | 1 | 0 | 100.0 | Asia | Wirtschaftliches Potenzial |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 671200 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'para compara: cemento, arenya y ladri... | PY | Paraguay | Concepción | USD | 13 | 0 | female | monthly | 1 | 1 | 0 | 0.0 | South America | Rohstoffe |
| 671201 | 25.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | Unknown | KES | 13 | 1 | female | monthly | 1 | 1 | 0 | 100.0 | Africa | Wachstumschancen |
| 671202 | 0.0 | 25.0 | Games | Entertainment | Unknown | KE | Kenya | Unknown | KES | 13 | 0 | Unknown | monthly | 0 | 0 | 0 | 0.0 | Africa | Wachstumschancen |
| 671203 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | Unknown | KES | 13 | 0 | female | monthly | 1 | 1 | 0 | 0.0 | Africa | Wachstumschancen |
| 671204 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | Unknown | KES | 13 | 0 | female | monthly | 1 | 1 | 0 | 0.0 | Africa | Wachstumschancen |
671205 rows × 19 columns
df_categories = df_country_cat.groupby(by=["Category","country","continent"]).agg(nr_project=("country","count")).reset_index()
#df_categories
df_categories.sort_values(by="nr_project",ascending=False)
df_6 = df_categories.loc[df_categories["nr_project"]>0,:]
df_6
| Category | country | continent | nr_project | |
|---|---|---|---|---|
| 19 | Infrastrukturprojekte | Azerbaijan | Asia | 1945 |
| 105 | Infrastrukturprojekte | Costa Rica | North America | 1561 |
| 117 | Infrastrukturprojekte | Dominican Republic | North America | 496 |
| 161 | Infrastrukturprojekte | Guatemala | South America | 7310 |
| 229 | Infrastrukturprojekte | Lebanon | Asia | 8792 |
| ... | ... | ... | ... | ... |
| 2361 | Wirtschaftliches Potenzial | Mexico | North America | 5741 |
| 2388 | Wirtschaftliches Potenzial | Namibia | Africa | 8 |
| 2413 | Wirtschaftliches Potenzial | Pakistan | Asia | 26857 |
| 2551 | Wirtschaftliches Potenzial | Turkey | Asia | 1703 |
| 2571 | Wirtschaftliches Potenzial | United States | North America | 6093 |
87 rows × 4 columns
# Anzeigen die Anzahl Länder des Kiva-Plattform für jede Gruppe
df_infra = df_country_cat.loc[df_country_cat["Category"]=="Infrastrukturprojekte",["country"]]
print(f"Infrastruktur:",df_infra["country"].unique().shape)
####
df_rohstoff = df_country_cat.loc[df_country_cat["Category"]=="Rohstoffe",["country"]]
print(f"Rohstoff:",df_rohstoff["country"].unique().shape)
###
df_tech = df_country_cat.loc[df_country_cat["Category"]=="Technologische Entwicklung",["country"]]
print(f"Technlogy:",df_tech["country"].unique().shape)
###
df_wachstum = df_country_cat.loc[df_country_cat["Category"]=="Wachstumschancen",["country"]]
print(f"Wachstum:",df_wachstum["country"].unique().shape)
###
df_wirtschaft = df_country_cat.loc[df_country_cat["Category"]=="Wirtschaftliches Potenzial",["country"]]
print(f"Wirtschaft:",df_wirtschaft["country"].unique().shape)
Infrastruktur: (8,) Rohstoff: (20,) Technlogy: (11,) Wachstum: (32,) Wirtschaft: (16,)
plot_3 = px.scatter(data_frame=df_6,
x='country',
y='nr_project',
title='Analysis of 87 Countries across 5 Categories',
labels={'Category':'Country Categories','nr_project':'Number of invested projects'},
color='Category'
)
plot_3.update_layout(legend=dict(title="Country Categories"),title_x=0.45)
plot_3.show()
plot_3a = px.choropleth(data_frame=df_6,
locations="country",
locationmode="country names",
color="Category",
hover_name="continent",
projection="natural earth",
color_continuous_scale="Oranges",
title="Analysis of 87 Countries across 5 Categories"
)
plot_3a.update_layout(title_x=0.45)
plot_3a.layout.legend.title = "Country Categories"
plot_3a.show()
Durch die farbliche Visualisierung von Scatter- und Choropleth-Plots werden die 5 Kategorien und ihre Verteilung auf der Kiva-Plattform deutlich dargestellt. Es wird deutlich sichtbar, dass die Anzahl der Länder auf der Kiva-Plattform nur einen kleinen Teil der Erde ausmacht, wie zuvor erwähnt (mit einem Anteil von 30,9%)
"Sollzustand"
"Vergleich Soll- und Istzustand"
Nächste Business Frage: In welchen Länder die Kiva-Plattform erfolgreich ist?
Basierend auf den vorangegangenen Daten soll ein Dashboard mit der Bibliothek Dash erstellt werden.
Das Dashboard soll:
- mindestens 1 HTML component enthalten
- 1 Dash Core Component mit dem Daten ausgewählt werden können
- 1 Grafik, die sich in Abhängigkeit von der Datenauswahl ändert
Verwende folgenden Code zum Ausführen der App:
my_app.run_server(mode='inline')
# für dash-app
from jupyter_dash import JupyterDash
from dash import dcc
from dash import html
from dash.dependencies import Input, Output
# für die Visualisierung
import plotly.express as px
# Zusatz
from dash import no_update
#df_final
# 1. app erstellen
my_us_app = JupyterDash(__name__)
# 2. layout erstellen
my_us_app.layout = html.Div([
html.H1("Analysis of 87 Countries across 5 Categories",
style={"font-size": 30, "textAlign":"center"}
),
# Radio Items or 5 Country Categories
dcc.RadioItems(id="radio_select_categories",
options=sorted(df_6.loc[:,"Category"].unique()),
#style={"textAlign": "centergogor"},
style={"textAlign": "center"},
inline=True,
value="Wachstumschancen"
),
html.Label(children="Countries",
style={"font-size":20, "text-align":"center"}
),
dcc.Dropdown(
id="dd_country",
options=[],
#options=[{"label":s, "value":s} for s in sorted(df_6.loc[:,"country"].unique())],
#value="",
clearable=True,
multi=True,
style={"width":1500}
),
# Dropdown für die Auswahl des Counties
# Graph
dcc.Graph(
id="my_scatter",
figure={}
)
])
# 3. callbacks erstellen
# callback 1 erstellen: Auswahl des States
@my_us_app.callback(
Output(component_id="dd_country", component_property="options"),
Input(component_id="radio_select_categories", component_property="value")
)
def set_country(chosen_category):
# Daten selektieren
df_country = df_6.loc[df_6["Category"]==chosen_category]
return [{"label":s, "value":s} for s in sorted(df_country.loc[:,"country"].unique())]
# callback 2 erstellen: zur Verfügung stehende countries sollen als Vorauswahl aufgefüllt werden
@my_us_app.callback(
Output(component_id="dd_country", component_property="value"),
Input(component_id="dd_country", component_property="options")
)
def set_country(available_countries):
return [country["value"] for country in available_countries]
# callback 3 erstellen: Grafik generieren
@my_us_app.callback(
Output(component_id="my_scatter", component_property="figure"),
Input(component_id="radio_select_categories", component_property="value"),
Input(component_id="dd_country", component_property="value")
)
def update_scatter(chose_categories, chosen_country):
if len(chosen_country)==0:
return no_update
else:
# Daten selektieren
df = df_6.loc[(df_6["Category"]==chose_categories) & (df_6["country"].isin(chosen_country)),["Category","country","continent"]]
# Grafik erstellen
my_scatter = px.choropleth(data_frame=df,
locations="country",
locationmode="country names",
color="Category",
hover_name="continent",
projection="natural earth",
color_continuous_scale="Oranges",
title="Analysis of 87 Countries across 5 Categories"
)
#my_scatter.update_traces(marker=dict(size=12))
my_scatter.update_layout(title_x=0.45)
my_scatter.layout.legend.title = "Country Categories"
return my_scatter
# 4. app ausführen
if __name__ == '__main__':
my_us_app.run_server(mode='inline', port=8091)
Dash is running on http://127.0.0.1:8091/